{
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "oHO_dChM-Ykl"
      },
      "source": [
        "# 🍳 CAMEL Cookbook: Building a Collaborative AI Research Society\n",
        "## Claude 4 + Azure OpenAI Collaboration for ARENA AI Alignment Research"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "jHGN9nIv-bIf"
      },
      "source": [
        "<div class=\"align-center\">\n",
        "  <a href=\"https://www.camel-ai.org/\"><img src=\"https://i.postimg.cc/KzQ5rfBC/button.png\" width=\"150\"></a>\n",
        "  <a href=\"https://discord.camel-ai.org\"><img src=\"https://i.postimg.cc/L4wPdG9N/join-2.png\" width=\"150\"></a>\n",
        "  \n",
        "⭐ <i>Star us on [*GitHub*](https://github.com/camel-ai/camel), join our [*Discord*](https://discord.camel-ai.org) or follow our [*X*](https://x.com/camelaiorg)</i>\n",
        "</div>"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "QSGuiwh--6ce"
      },
      "source": [
        "## 📋 Overview\n",
        "\n",
        "This cookbook demonstrates how to create a collaborative multi-agent society using CAMEL-AI, bringing together Claude 4 and Azure OpenAI models to research AI alignment topics from the ARENA curriculum.\n",
        "\n",
        "Our society consists of 4 specialized AI researchers with distinct personas and expertise areas."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "fy_Q2Vp3_RgL"
      },
      "source": [
        "## So, let's catapult our way right in 🧚"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "XxC8Xe40_f6x"
      },
      "source": [
        "## 🛠️ Dependencies and Setup\n",
        "First, let's install the required dependencies and configure our API credentials:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "EKduVbY3XSYV",
        "outputId": "61642b98-baed-4fa7-a7a8-561e380c020d"
      },
      "outputs": [],
      "source": [
        "!pip install \"camel-ai==0.2.64\" anthropic"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 45,
      "metadata": {
        "id": "oZlcerEn_jtz"
      },
      "outputs": [],
      "source": [
        "import textwrap\n",
        "import os\n",
        "from getpass import getpass\n",
        "from typing import Dict, Any\n",
        "\n",
        "from camel.agents import ChatAgent\n",
        "from camel.messages import BaseMessage\n",
        "from camel.models import ModelFactory\n",
        "from camel.models.azure_openai_model import AzureOpenAIModel\n",
        "from camel.tasks import Task\n",
        "from camel.toolkits import FunctionTool, SearchToolkit\n",
        "from camel.types import ModelPlatformType, ModelType\n",
        "from camel.societies.workforce import Workforce"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "VVhZfj7YaIJL"
      },
      "source": [
        "Prepare API keys: Azure OpenAI, Claude (Anthropic), and optionally Google Search\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 46,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "hMPCR_PyXzfZ",
        "outputId": "7f962858-c402-474a-92e7-7a58ab5906f6"
      },
      "outputs": [
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "Setup optional API Keys for Google search functionality?(y/n): n\n"
          ]
        }
      ],
      "source": [
        "# Ensuring API Keys are set\n",
        "if not os.getenv(\"AZURE_OPENAI_API_KEY\"):\n",
        "  print(\"AZURE OPENAI API KEY is required to proceed.\")\n",
        "  azure_openai_api_key = getpass(\"Enter your Azure OpenAI API Key: \")\n",
        "  os.environ[\"AZURE_OPENAI_API_KEY\"] = azure_openai_api_key\n",
        "\n",
        "if not os.getenv(\"AZURE_OPENAI_ENDPOINT\"):\n",
        "  print(\"Azure OpenAI Endpoint is required to proceed.\")\n",
        "  azure_openai_endpoint = input(\"Enter your Azure OpenAI Endpoint: \")\n",
        "  os.environ[\"AZURE_OPENAI_ENDPOINT\"] = azure_openai_endpoint\n",
        "\n",
        "if not os.getenv(\"ANTHROPIC_API_KEY\"):\n",
        "  print(\"ANTHROPIC API KEY is required to proceed.\")\n",
        "  anthropic_api_key = getpass(\"Enter your Anthropic API Key: \")\n",
        "  os.environ[\"ANTHROPIC_API_KEY\"] = anthropic_api_key\n",
        "\n",
        "optional_keys_setup = input(\"Setup optional API Keys for Google search functionality?(y/n): \").lower()\n",
        "\n",
        "if \"y\" in optional_keys_setup:\n",
        "  if not os.getenv(\"GOOGLE_API_KEY\"):\n",
        "    print(\"[OPTIONAL] Provide a GOOGLE CLOUD API KEY for google search.\")\n",
        "    google_api_key = getpass(\"Enter your Google API KEY: \")\n",
        "    os.environ[\"GOOGLE_API_KEY\"] = google_api_key\n",
        "\n",
        "  if not os.getenv(\"SEARCH_ENGINE_ID\"):\n",
        "    print(\"[OPTIONAL] Provide a search engine ID for google search.\")\n",
        "    search_engine_id = getpass(\"Enter your Search Engine ID: \")\n",
        "    os.environ[\"SEARCH_ENGINE_ID\"] = search_engine_id\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "EXBs1Cy-_nRM"
      },
      "source": [
        "### What this does:\n",
        "\n",
        "- Imports all necessary CAMEL-AI components\n",
        "- Collects the required API keys securely with getpass and stores them as environment variables\n",
        "- Optionally configures Google Search credentials for the research tools\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "bNK_5jFi_rUe"
      },
      "source": [
        "## 🏗️ Core Society Class Structure\n",
        "Let's define our main research society class:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 47,
      "metadata": {
        "id": "A_SEy-2i_t1j"
      },
      "outputs": [],
      "source": [
        "class ARENAResearchSociety:\n",
        "    \"\"\"\n",
        "    A collaborative CAMEL society between Claude 4 and Azure OpenAI\n",
        "    for researching the ARENA AI alignment curriculum.\n",
        "    \"\"\"\n",
        "\n",
        "    def __init__(self):\n",
        "        self.workforce = None\n",
        "        self.setup_api_keys()"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "mfO3_w-n_wp7"
      },
      "source": [
        "### What this does:\n",
        "\n",
        "- Creates the main class that will orchestrate our AI research society\n",
        "- Initializes with API key setup to ensure proper authentication\n",
        "- Prepares the workforce variable for later agent assignment"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "9XnEUVeq_z2N"
      },
      "source": [
        "## 🔑 API Configuration Management\n",
        "Configure all necessary API keys and endpoints:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 48,
      "metadata": {
        "id": "LRIpFZkg_2Dh"
      },
      "outputs": [],
      "source": [
        "def setup_api_keys(self):\n",
        "    \"\"\"Setup API keys for Azure OpenAI and Claude\"\"\"\n",
        "    print(\"🔧 Setting up API keys...\")\n",
        "\n",
        "    # Azure OpenAI configuration\n",
        "    if not os.getenv(\"AZURE_OPENAI_API_KEY\"):\n",
        "        azure_api_key = getpass(\"Please input your Azure OpenAI API key: \")\n",
        "        os.environ[\"AZURE_OPENAI_API_KEY\"] = azure_api_key\n",
        "\n",
        "    if not os.getenv(\"AZURE_OPENAI_ENDPOINT\"):\n",
        "        azure_endpoint = getpass(\"Please input your Azure OpenAI endpoint: \")\n",
        "        os.environ[\"AZURE_OPENAI_ENDPOINT\"] = azure_endpoint\n",
        "\n",
        "    if not os.getenv(\"AZURE_DEPLOYMENT_NAME\"):\n",
        "        deployment_name = getpass(\"Please input your Azure deployment name (e.g., div-o4-mini): \")\n",
        "        os.environ[\"AZURE_DEPLOYMENT_NAME\"] = deployment_name\n",
        "\n",
        "    # Set OPENAI_API_KEY for compatibility (use Azure key)\n",
        "    os.environ[\"OPENAI_API_KEY\"] = os.getenv(\"AZURE_OPENAI_API_KEY\")\n",
        "\n",
        "    # Claude API configuration\n",
        "    if not os.getenv(\"ANTHROPIC_API_KEY\"):\n",
        "        claude_api_key = getpass(\"Please input your Claude API key: \")\n",
        "        os.environ[\"ANTHROPIC_API_KEY\"] = claude_api_key\n",
        "\n",
        "    # Optional: Google Search for research capabilities\n",
        "    if not os.getenv(\"GOOGLE_API_KEY\"):\n",
        "        try:\n",
        "            google_api_key = getpass(\"Please input your Google API key (optional, press Enter to skip): \")\n",
        "            if google_api_key:\n",
        "                os.environ[\"GOOGLE_API_KEY\"] = google_api_key\n",
        "                search_engine_id = getpass(\"Please input your Search Engine ID: \")\n",
        "                if search_engine_id:  # Only set if provided\n",
        "                    os.environ[\"SEARCH_ENGINE_ID\"] = search_engine_id\n",
        "                else:\n",
        "                    print(\"⚠️ Search Engine ID not provided. Search functionality will be disabled.\")\n",
        "        except KeyboardInterrupt:\n",
        "            print(\"Skipping Google Search setup...\")\n",
        "\n",
        "    print(\"✅ API keys configured!\")\n",
        "ARENAResearchSociety.setup_api_keys = setup_api_keys"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "vWpYeU01_5PY"
      },
      "source": [
        "### What this does:\n",
        "\n",
        "- Securely collects API credentials using getpass (hidden input)\n",
        "- Supports Azure OpenAI, Claude (Anthropic), and optional Google Search\n",
        "- Sets environment variables for seamless integration\n",
        "- Provides graceful fallbacks for optional components"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "N4oDHnlN_-OL"
      },
      "source": [
        "## 🤖 Azure OpenAI Agent Creation\n",
        "Create specialized Azure OpenAI agents with custom personas:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 49,
      "metadata": {
        "id": "jZrn1AutACkN"
      },
      "outputs": [],
      "source": [
        "def create_azure_agent(self, role_name: str, persona: str, specialization: str) -> ChatAgent:\n",
        "    \"\"\"Create an Azure OpenAI agent with specific role and persona\"\"\"\n",
        "\n",
        "    msg_content = textwrap.dedent(f\"\"\"\n",
        "    You are {role_name}, a researcher specializing in AI alignment and safety.\n",
        "\n",
        "    Your persona: {persona}\n",
        "\n",
        "    Your specialization: {specialization}\n",
        "\n",
        "    You are part of a collaborative research team studying the ARENA AI alignment curriculum.\n",
        "    ARENA focuses on practical AI safety skills including:\n",
        "    - Mechanistic interpretability\n",
        "    - Reinforcement learning from human feedback (RLHF)\n",
        "    - AI governance and policy\n",
        "    - Robustness and adversarial examples\n",
        "\n",
        "    When collaborating:\n",
        "    1. Provide detailed, technical analysis\n",
        "    2. Reference specific ARENA modules when relevant\n",
        "    3. Build upon other agents' findings\n",
        "    4. Maintain academic rigor while being accessible\n",
        "    5. Always cite sources and provide evidence for claims\n",
        "    \"\"\").strip()\n",
        "\n",
        "    sys_msg = BaseMessage.make_assistant_message(\n",
        "        role_name=role_name,\n",
        "        content=msg_content,\n",
        "    )\n",
        "\n",
        "    # Configure Azure OpenAI model with correct API version for o4-mini\n",
        "    model = AzureOpenAIModel(\n",
        "        model_type=ModelType.GPT_4O_MINI,\n",
        "        api_key=os.getenv(\"AZURE_OPENAI_API_KEY\"),\n",
        "        url=os.getenv(\"AZURE_OPENAI_ENDPOINT\"),\n",
        "        api_version=\"2025-01-01-preview\",  # Updated to support o4-mini\n",
        "        azure_deployment_name=os.getenv(\"AZURE_DEPLOYMENT_NAME\") or \"div-o4-mini\"\n",
        "    )\n",
        "\n",
        "    return ChatAgent(\n",
        "        system_message=sys_msg,\n",
        "        model=model,\n",
        "    )\n",
        "ARENAResearchSociety.create_azure_agent = create_azure_agent"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "H6WJouBHAFp3"
      },
      "source": [
        "### What this does:\n",
        "\n",
        "- Creates customizable Azure OpenAI agents with specific roles and expertise\n",
        "- Embeds ARENA curriculum knowledge into each agent's system prompt\n",
        "- Uses the latest API version compatible with o4-mini model\n",
        "- Returns a fully configured ChatAgent ready for collaboration"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "ZrGBBCAZAIsx"
      },
      "source": [
        "## 🧠 Claude Agent Creation\n",
        "Create Claude agents with complementary capabilities:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 50,
      "metadata": {
        "id": "PKA9hwsmALdj"
      },
      "outputs": [],
      "source": [
        "def create_claude_agent(self, role_name: str, persona: str, specialization: str, tools=None) -> ChatAgent:\n",
        "    \"\"\"Create a Claude agent with specific role and persona\"\"\"\n",
        "\n",
        "    msg_content = textwrap.dedent(f\"\"\"\n",
        "    You are {role_name}, a researcher specializing in AI alignment and safety.\n",
        "\n",
        "    Your persona: {persona}\n",
        "\n",
        "    Your specialization: {specialization}\n",
        "\n",
        "    You are part of a collaborative research team studying the ARENA AI alignment curriculum.\n",
        "    ARENA focuses on practical AI safety skills including:\n",
        "    - Mechanistic interpretability\n",
        "    - Reinforcement learning from human feedback (RLHF)\n",
        "    - AI governance and policy\n",
        "    - Robustness and adversarial examples\n",
        "\n",
        "    When collaborating:\n",
        "    1. Provide thorough, nuanced analysis\n",
        "    2. Consider ethical implications and long-term consequences\n",
        "    3. Synthesize information from multiple perspectives\n",
        "    4. Ask probing questions to deepen understanding\n",
        "    5. Connect concepts across different AI safety domains\n",
        "    \"\"\").strip()  # Remove leading/trailing whitespace\n",
        "\n",
        "    sys_msg = BaseMessage.make_assistant_message(\n",
        "        role_name=role_name,\n",
        "        content=msg_content,\n",
        "    )\n",
        "\n",
        "    # Configure Claude model\n",
        "    model = ModelFactory.create(\n",
        "        model_platform=ModelPlatformType.ANTHROPIC,\n",
        "        model_type=ModelType.CLAUDE_3_5_SONNET,\n",
        "    )\n",
        "\n",
        "    agent = ChatAgent(\n",
        "        system_message=sys_msg,\n",
        "        model=model,\n",
        "        tools=tools or [],\n",
        "    )\n",
        "\n",
        "    return agent\n",
        "ARENAResearchSociety.create_claude_agent = create_claude_agent"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "fN9_ic0lANV3"
      },
      "source": [
        "### What this does:\n",
        "\n",
        "- Creates Claude agents with nuanced, philosophical thinking capabilities\n",
        "- Emphasizes ethical considerations and long-term thinking\n",
        "- Supports optional tool integration (like search capabilities)\n",
        "- Uses Claude 3.5 Sonnet for advanced reasoning\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "JvxlJ7RjARi5"
      },
      "source": [
        "## 👥 Workforce Assembly\n",
        "Bring together all agents into a collaborative workforce:\n",
        "\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 51,
      "metadata": {
        "id": "X_u3REt4ATk9"
      },
      "outputs": [],
      "source": [
        "def create_research_workforce(self):\n",
        "    \"\"\"Create the collaborative research workforce\"\"\"\n",
        "    print(\"🏗️ Creating ARENA Research Society...\")\n",
        "\n",
        "    # Setup search tools for the lead researcher (only if properly configured)\n",
        "    search_tools = []\n",
        "    if os.getenv(\"GOOGLE_API_KEY\") and os.getenv(\"SEARCH_ENGINE_ID\"):\n",
        "        try:\n",
        "            search_toolkit = SearchToolkit()\n",
        "            search_tools = [\n",
        "                FunctionTool(search_toolkit.search_google),\n",
        "            ]\n",
        "            print(\"🔍 Search tools enabled for lead researcher\")\n",
        "        except Exception as e:\n",
        "            print(f\"⚠️ Search tools disabled due to configuration issue: {e}\")\n",
        "            search_tools = []\n",
        "    else:\n",
        "        print(\"🔍 Search tools disabled - missing API keys\")\n",
        "\n",
        "    # Create Claude agents\n",
        "    claude_lead = self.create_claude_agent(\n",
        "        role_name=\"Dr. Claude Alignment\",\n",
        "        persona=\"A thoughtful, methodical researcher who excels at synthesizing complex information and identifying key insights. Known for asking the right questions and seeing the bigger picture. Works with existing knowledge when search tools are unavailable.\",\n",
        "        specialization=\"AI safety frameworks, mechanistic interpretability, and curriculum analysis\",\n",
        "        tools=search_tools\n",
        "    )\n",
        "\n",
        "    claude_ethicist = self.create_claude_agent(\n",
        "        role_name=\"Prof. Claude Ethics\",\n",
        "        persona=\"A philosophical thinker who deeply considers the ethical implications and long-term consequences of AI development. Bridges technical concepts with societal impact.\",\n",
        "        specialization=\"AI governance, policy implications, and ethical frameworks in AI alignment\"\n",
        "    )\n",
        "\n",
        "    # Create Azure OpenAI agents\n",
        "    azure_technical = self.create_azure_agent(\n",
        "        role_name=\"Dr. Azure Technical\",\n",
        "        persona=\"A detail-oriented technical expert who dives deep into implementation specifics and mathematical foundations. Excellent at breaking down complex algorithms.\",\n",
        "        specialization=\"RLHF implementation, robustness techniques, and technical deep-dives\"\n",
        "    )\n",
        "\n",
        "    azure_practical = self.create_azure_agent(\n",
        "        role_name=\"Dr. Azure Practical\",\n",
        "        persona=\"A pragmatic researcher focused on real-world applications and practical implementation. Bridges theory with practice.\",\n",
        "        specialization=\"Practical AI safety applications, training methodologies, and hands-on exercises\"\n",
        "    )\n",
        "\n",
        "    # Configure coordinator and task agents to use Azure OpenAI with correct API version\n",
        "    coordinator_agent_kwargs = {\n",
        "        'model': AzureOpenAIModel(\n",
        "            model_type=ModelType.GPT_4O_MINI,\n",
        "            api_key=os.getenv(\"AZURE_OPENAI_API_KEY\"),\n",
        "            url=os.getenv(\"AZURE_OPENAI_ENDPOINT\"),\n",
        "            api_version=\"2025-01-01-preview\",\n",
        "            azure_deployment_name=os.getenv(\"AZURE_DEPLOYMENT_NAME\") or \"div-o4-mini\"\n",
        "        ),\n",
        "        'token_limit': 8000\n",
        "    }\n",
        "\n",
        "    task_agent_kwargs = {\n",
        "        'model': AzureOpenAIModel(\n",
        "            model_type=ModelType.GPT_4O_MINI,\n",
        "            api_key=os.getenv(\"AZURE_OPENAI_API_KEY\"),\n",
        "            url=os.getenv(\"AZURE_OPENAI_ENDPOINT\"),\n",
        "            api_version=\"2025-01-01-preview\",\n",
        "            azure_deployment_name=os.getenv(\"AZURE_DEPLOYMENT_NAME\") or \"div-o4-mini\"\n",
        "        ),\n",
        "        'token_limit': 16000\n",
        "    }\n",
        "\n",
        "    # Create the workforce with proper configuration\n",
        "    self.workforce = Workforce(\n",
        "        'ARENA AI Alignment Research Society',\n",
        "        coordinator_agent_kwargs=coordinator_agent_kwargs,\n",
        "        task_agent_kwargs=task_agent_kwargs\n",
        "    )\n",
        "\n",
        "    # Add agents with descriptive roles\n",
        "    self.workforce.add_single_agent_worker(\n",
        "        'Dr. Claude Alignment (Lead Researcher) - Synthesizes information, leads research direction, and provides comprehensive analysis based on existing knowledge',\n",
        "        worker=claude_lead,\n",
        "    ).add_single_agent_worker(\n",
        "        'Prof. Claude Ethics (Ethics & Policy Specialist) - Analyzes ethical implications, policy considerations, and societal impact of AI alignment research',\n",
        "        worker=claude_ethicist,\n",
        "    ).add_single_agent_worker(\n",
        "        'Dr. Azure Technical (Technical Deep-Dive Specialist) - Provides detailed technical analysis, mathematical foundations, and implementation specifics',\n",
        "        worker=azure_technical,\n",
        "    ).add_single_agent_worker(\n",
        "        'Dr. Azure Practical (Applied Research Specialist) - Focuses on practical applications, training methodologies, and hands-on implementation guidance',\n",
        "        worker=azure_practical,\n",
        "    )\n",
        "\n",
        "    print(\"✅ ARENA Research Society created with 4 specialized agents!\")\n",
        "    return self.workforce\n",
        "ARENAResearchSociety.create_research_workforce = create_research_workforce"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "PVUoO9qgAXqH"
      },
      "source": [
        "### What this does:\n",
        "\n",
        "- Creates 4 specialized researchers: 2 Claude agents + 2 Azure OpenAI agents\n",
        "- Each agent has distinct personalities and expertise areas\n",
        "- Configures search tools for the lead researcher (when available)\n",
        "- Sets up proper workforce coordination using Azure OpenAI models\n",
        "- Creates a balanced team covering technical, practical, and ethical perspectives"
      ]
    },
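    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "The `add_single_agent_worker` calls above chain together because each call returns the workforce object itself. Here is a minimal, pure-Python sketch of that fluent-builder pattern (illustrative only; `MiniWorkforce` is a hypothetical stand-in, not the CAMEL implementation):"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "class MiniWorkforce:\n",
        "    \"\"\"Hypothetical sketch: each add_* method returns self so calls chain.\"\"\"\n",
        "\n",
        "    def __init__(self, name):\n",
        "        self.name = name\n",
        "        self.workers = []\n",
        "\n",
        "    def add_single_agent_worker(self, description, worker=None):\n",
        "        self.workers.append((description, worker))\n",
        "        return self  # returning self is what enables the chained calls\n",
        "\n",
        "wf = MiniWorkforce(\"demo\").add_single_agent_worker(\"A\").add_single_agent_worker(\"B\")\n",
        "print(len(wf.workers))  # 2"
      ]
    },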
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "ewoOVtG1AbHk"
      },
      "source": [
        "## 📋 Research Task Creation\n",
        "Define structured research tasks for the collaborative team:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 52,
      "metadata": {
        "id": "qBX8N-ZrAd7j"
      },
      "outputs": [],
      "source": [
        "def create_research_task(self, research_topic: str, specific_questions: str = None) -> Task:\n",
        "    \"\"\"Create a research task for the ARENA curriculum\"\"\"\n",
        "\n",
        "    arena_context = {\n",
        "        \"curriculum_info\": \"ARENA (Alignment Research Engineer Accelerator) is a comprehensive, hands-on AI safety curriculum\",\n",
        "        \"focus_areas\": [\n",
        "            \"Mechanistic Interpretability - Understanding how neural networks work internally\",\n",
        "            \"Reinforcement Learning from Human Feedback (RLHF) - Training AI systems to be helpful and harmless\",\n",
        "            \"AI Governance - Policy, regulation, and coordination for AI safety\",\n",
        "            \"Robustness & Adversarial Examples - Making AI systems robust to attacks and edge cases\"\n",
        "        ],\n",
        "        \"emphasis\": \"practical skills, hands-on exercises, and real-world applications\",\n",
        "        \"website\": \"https://www.arena.education/curriculum\"\n",
        "    }\n",
        "\n",
        "    # Check if search tools are available\n",
        "    has_search = bool(os.getenv(\"GOOGLE_API_KEY\") and os.getenv(\"SEARCH_ENGINE_ID\"))\n",
        "\n",
        "    base_content = f\"\"\"\n",
        "    Research Topic: {research_topic}\n",
        "\n",
        "    Please conduct a comprehensive collaborative research analysis on this topic in relation to the ARENA AI alignment curriculum.\n",
        "\n",
        "    {'Note: Search tools are available for gathering latest information.' if has_search else 'Note: Analysis will be based on existing knowledge as search tools are not available.'}\n",
        "\n",
        "    Research Process:\n",
        "    1. **Information Gathering** - {'Collect relevant information about the topic, including latest developments' if has_search else 'Analyze the topic based on existing knowledge and understanding'}\n",
        "    2. **Technical Analysis** - Provide detailed technical breakdown and mathematical foundations\n",
        "    3. **Practical Applications** - Explore how this relates to hands-on ARENA exercises and real-world implementation\n",
        "    4. **Ethical Considerations** - Analyze policy implications and ethical frameworks\n",
        "    5. **Synthesis** - Combine all perspectives into actionable insights and recommendations\n",
        "\n",
        "    Expected Deliverables:\n",
        "    - Comprehensive analysis from each specialist perspective\n",
        "    - Identification of key concepts and their relationships\n",
        "    - Practical implementation guidance\n",
        "    - Policy and ethical considerations\n",
        "    - Recommendations for further research or curriculum development\n",
        "    \"\"\"\n",
        "\n",
        "    if specific_questions:\n",
        "        base_content += f\"\\n\\nSpecific Research Questions:\\n{specific_questions}\"\n",
        "\n",
        "    return Task(\n",
        "        content=base_content.strip(),\n",
        "        additional_info=arena_context,\n",
        "        id=\"arena_research_001\",\n",
        "    )\n",
        "ARENAResearchSociety.create_research_task = create_research_task"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "GpuXTYOkAhOi"
      },
      "source": [
        "### What this does:\n",
        "\n",
        "- Creates structured research tasks with clear objectives and deliverables\n",
        "- Adapts task content based on available tools (search vs. knowledge-based)\n",
        "- Includes ARENA curriculum context for focused analysis\n",
        "- Supports custom research questions for specialized investigations"
      ]
    },
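    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "To make the optional `specific_questions` behavior concrete, here is the string-assembly step from the cell above in isolation (plain Python, with example question text):"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "base_content = \"Research Topic: RLHF Implementation Challenges\"\n",
        "specific_questions = \"1. How does reward hacking arise?\\n2. How can it be mitigated?\"\n",
        "\n",
        "# The questions section is appended only when questions are provided\n",
        "if specific_questions:\n",
        "    base_content += f\"\\n\\nSpecific Research Questions:\\n{specific_questions}\"\n",
        "\n",
        "print(base_content)"
      ]
    },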
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "ktGiQU9PAkL9"
      },
      "source": [
        "## 🔬 Research Execution\n",
        "Execute collaborative research sessions:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 53,
      "metadata": {
        "id": "32IPUR3iAmVY"
      },
      "outputs": [],
      "source": [
        "def run_research(self, research_topic: str, specific_questions: str = None):\n",
        "    \"\"\"Run a collaborative research session\"\"\"\n",
        "    if not self.workforce:\n",
        "        self.create_research_workforce()\n",
        "\n",
        "    print(f\"🔬 Starting collaborative research on: {research_topic}\")\n",
        "    print(\"=\" * 60)\n",
        "\n",
        "    task = self.create_research_task(research_topic, specific_questions)\n",
        "    processed_task = self.workforce.process_task(task)\n",
        "\n",
        "    print(\"\\n\" + \"=\" * 60)\n",
        "    print(\"📊 RESEARCH RESULTS\")\n",
        "    print(\"=\" * 60)\n",
        "    print(processed_task.result)\n",
        "\n",
        "    return processed_task.result\n",
        "ARENAResearchSociety.run_research = run_research"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "CNGecl-jAp2L"
      },
      "source": [
        "### What this does:\n",
        "\n",
        "- Orchestrates the entire research process\n",
        "- Creates the workforce if not already initialized\n",
        "- Processes tasks through the collaborative agent network\n",
        "- Returns formatted research results\n"
      ]
    },
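    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "`run_research` builds the workforce lazily: the first call constructs it, and later calls reuse it. Here is a self-contained sketch of that guard pattern (`LazyRunner` is hypothetical, not the CAMEL API):"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "class LazyRunner:\n",
        "    \"\"\"Hypothetical sketch of lazy initialization: build once, reuse after.\"\"\"\n",
        "\n",
        "    def __init__(self):\n",
        "        self.workforce = None\n",
        "        self.builds = 0\n",
        "\n",
        "    def _create_workforce(self):\n",
        "        self.builds += 1\n",
        "        return object()  # stand-in for the real Workforce\n",
        "\n",
        "    def run(self, topic):\n",
        "        if not self.workforce:  # same guard as run_research above\n",
        "            self.workforce = self._create_workforce()\n",
        "        return f\"researching: {topic}\"\n",
        "\n",
        "runner = LazyRunner()\n",
        "runner.run(\"RLHF\")\n",
        "runner.run(\"interpretability\")\n",
        "print(runner.builds)  # 1"
      ]
    },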
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "yoB1-obIAsBA"
      },
      "source": [
        "## 🎯 Interactive Demo Interface\n",
        "Create an interactive interface for easy topic selection:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 59,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "luftv50cAy10",
        "outputId": "15020813-3448-4251-feec-82343422f9a0"
      },
      "outputs": [
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "🔧 Setting up API keys...\n",
            "Please input your Google API key (optional, press Enter to skip): ··········\n",
            "✅ API keys configured!\n",
            "🎯 ARENA AI Alignment Research Society\n",
            "Choose a research topic or provide your own:\n",
            "\n",
            "1. Mechanistic Interpretability in Large Language Models\n",
            "2. RLHF Implementation Challenges and Best Practices\n",
            "3. AI Governance Frameworks for Emerging Technologies\n",
            "4. Custom research topic\n",
            "\n",
            "Enter your choice (1-4): 1\n",
            "🏗️ Creating ARENA Research Society...\n",
            "🔍 Search tools disabled - missing API keys\n"
          ]
        },
        {
          "name": "stderr",
          "output_type": "stream",
          "text": [
            "WARNING:camel.camel.societies.workforce.workforce:No new_worker_agent_kwargs provided. Workers created at runtime will use default ChatAgent settings with SearchToolkit, CodeExecutionToolkit, and ThinkingToolkit. To customize runtime worker creation, pass a dictionary with ChatAgent parameters, e.g.: {'model': your_model, 'tools': your_tools}. See ChatAgent documentation for all available options.\n"
          ]
        },
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "✅ ARENA Research Society created with 4 specialized agents!\n",
            "🔬 Starting collaborative research on: Mechanistic Interpretability in Large Language Models\n",
            "============================================================\n",
            "\u001b[33mWorker node 1ee352ae-ad84-4246-a184-add64875c6f8 (Dr. Claude Alignment (Lead Researcher) - Synthesizes information, leads research direction, and provides comprehensive analysis based on existing knowledge) get task arena_research_001.0: 1. (Dr. Claude Alignment) Conduct a comprehensive information‐gathering and synthesis on mechanistic interpretability in LLMs as it relates to the ARENA AI alignment curriculum: identify key concepts and relationships, map them to alignment strategies, and draft high‐level recommendations for further research and curriculum development.\u001b[39m\n",
            "======\n",
            "\u001b[32mReply from Worker node 1ee352ae-ad84-4246-a184-add64875c6f8 (Dr. Claude Alignment (Lead Researcher) - Synthesizes information, leads research direction, and provides comprehensive analysis based on existing knowledge):\u001b[39m\n",
            "\n",
            "\u001b[32mMechanistic Interpretability Analysis for ARENA Curriculum\n",
            "\n",
            "Key Concepts and Relationships:\n",
            "1. Feature Visualization\n",
            "- Understanding how individual neurons and layers represent information\n",
            "- Mapping activation patterns to semantic concepts\n",
            "- Techniques for visualizing network internals\n",
            "\n",
            "2. Circuit Analysis\n",
            "- Identifying and studying specific circuits within neural networks\n",
            "- Tracing information flow between components\n",
            "- Understanding computation patterns and transformations\n",
            "\n",
            "3. Attribution Methods\n",
            "- Techniques for determining which inputs influence specific outputs\n",
            "- Gradient-based approaches\n",
            "- Attention analysis in transformer architectures\n",
            "\n",
            "Alignment Strategy Mapping:\n",
            "1. Transparency\n",
            "- Using interpretability tools to understand model decision-making\n",
            "- Identifying potential failure modes and biases\n",
            "- Validating alignment with intended objectives\n",
            "\n",
            "2. Safety Monitoring\n",
            "- Developing methods to detect unwanted behaviors\n",
            "- Creating interpretability-based safety metrics\n",
            "- Establishing monitoring frameworks\n",
            "\n",
            "3. Iterative Refinement\n",
            "- Using interpretability insights to improve model training\n",
            "- Adjusting architectures based on circuit analysis\n",
            "- Refining alignment techniques based on empirical findings\n",
            "\n",
            "Recommendations:\n",
            "1. Curriculum Development\n",
            "- Integrate hands-on interpretability exercises\n",
            "- Focus on practical tools and techniques\n",
            "- Include case studies of successful interpretability research\n",
            "\n",
            "2. Research Priorities\n",
            "- Develop better tools for scaling interpretability to larger models\n",
            "- Investigate connections between interpretability and robustness\n",
            "- Study how interpretability can inform alignment strategies\n",
            "\n",
            "3. Future Directions\n",
            "- Explore automated interpretability techniques\n",
            "- Investigate interpretability's role in governance\n",
            "- Develop standardized evaluation metrics\n",
            "\n",
            "Emphasis should be placed on connecting interpretability insights to practical alignment strategies and safety mechanisms.\u001b[39m\n",
            "======\u001b[33mWorker node 3181a376-331b-498a-95f3-b178f1d87ef9 (Dr. Azure Technical (Technical Deep-Dive Specialist) - Provides detailed technical analysis, mathematical foundations, and implementation specifics) get task arena_research_001.1: 2. (Dr. Azure Technical) Deliver a detailed technical analysis of mechanistic interpretability techniques—focusing on attention patterns and residual streams—including their mathematical foundations, algorithmic implementations, current limitations, and potential future directions.\u001b[39m\n",
            "======\n",
            "\u001b[32mReply from Worker node 3181a376-331b-498a-95f3-b178f1d87ef9 (Dr. Azure Technical (Technical Deep-Dive Specialist) - Provides detailed technical analysis, mathematical foundations, and implementation specifics):\u001b[39m\n",
            "\n",
            "\u001b[32m1. Overview\n",
            "This analysis dives into two core mechanistic interpretability techniques in transformer‐based LLMs—attention‐pattern analysis and residual‐stream analysis—covering their mathematical underpinnings, algorithmic implementations, limitations, and future directions.\n",
            "\n",
            "2. Attention Patterns\n",
            "2.1 Mathematical Foundations\n",
            "- Self‐Attention Formula:\n",
            "  Let X∈R^{n×d} be input token embeddings. We define queries Q=XW_Q, keys K=XW_K, and values V=XW_V, where W_Q,W_K,W_V ∈R^{d×d} are learned matrices. Attention weights A∈R^{n×n} are computed as:  \n",
            "    A = softmax(QK^T/√d_k)  \n",
            "  The output is O = A·V.\n",
            "- Multi‐Head Decomposition:\n",
            "  With h heads, W_Q splits into {W_Q^i}, etc. Outputs O^i=A^iV^i are concatenated and linearly projected.  \n",
            "- Eigen‐Decomposition & SVD:\n",
            "  One can spectral‐analyze the attention tensor A to identify dominant attention modes (e.g., diagonal dominance vs. key‐token clustering).\n",
            "\n",
            "2.2 Algorithmic Implementations\n",
            "- Attention Rollout (Abnar & Zuidema 2020):\n",
            "  Recursively aggregate head matrices to compute effective attention from layer 1 to L: R^(ℓ)=A^(ℓ)R^(ℓ−1), R^(0)=I.\n",
            "- Attention Flow (Vig 2019):\n",
            "  Treat attention as a flow network; compute max‐flow/min‐cut to find critical token pathways.\n",
            "- Head Attribution (Wiegreffe & Pinter 2019):\n",
            "  Ablate individual heads (zero out W_Q,W_K,W_V) and measure Δ loss or Δ feature activation to rank head importance.\n",
            "\n",
            "2.3 Current Limitations\n",
            "- Correlation vs. Causation: High attention weight does not imply causal importance.  \n",
            "- Sparsity & Noise: Many attention heads specialize on trivial patterns (e.g., next‐token copy), inflating interpretive noise.  \n",
            "- Scalability: Quadratic cost O(n^2) in sequence length makes analysis in long‐context models expensive.\n",
            "\n",
            "2.4 Future Directions\n",
            "- Causal Mediation Analysis (Elhage et al. 2021): Use interventionist techniques to isolate head contributions.  \n",
            "- Sparse‐Attention Probes: Develop approximate algorithms (e.g., Linformer, Performers) to scale pattern extraction to >10K tokens.  \n",
            "- Learned Pattern Libraries: Automatically cluster heads by functional role (e.g., copy, syntax, coreference).\n",
            "\n",
            "3. Residual Streams\n",
            "3.1 Mathematical Foundations\n",
            "- Residual Path Equation:\n",
            "  At transformer block ℓ, input h^(ℓ) is updated as:\n",
            "    h' = h^(ℓ) + Attention(h^(ℓ))  \n",
            "    h^(ℓ+1) = h' + MLP(h')\n",
            "  Thus the residual stream accumulates contributions of all sublayers across blocks.\n",
            "- Linearized Path Analysis:\n",
            "  Approximate non‐linear sublayers via first‐order Taylor expansions around a reference input to decompose contributions linearly.\n",
            "\n",
            "3.2 Algorithmic Implementations\n",
            "- Causal Tracing (Meng et al. 2022):\n",
            "  Replace residual activations at a given token and layer with a baseline, then measure downstream effect on logits.  \n",
            "- Principal Component Probing (Dalvi et al. 2023):\n",
            "  Perform PCA on residual activations over many inputs; correlate principal components with semantic features via linear probes.\n",
            "- Neuron Clamping (Olah et al. 2020):\n",
            "  Zero or amplify individual neuron activations in the residual stream to test their causal effect on model outputs.\n",
            "\n",
            "3.3 Current Limitations\n",
            "- Non‐Linearity: MLPs introduce non‐linear couplings; linear path decompositions can misattribute cross‐term effects.  \n",
            "- Interactions Across Layers: The residual stream is a superposition; isolating a single circuit among O(L×d) pathways is combinatorial.  \n",
            "- Compute & Memory Overhead: Causal Tracing requires O(L×n) forward passes per token perturbation, which scales poorly.\n",
            "\n",
            "3.4 Future Directions\n",
            "- Automated Circuit Discovery: Use search algorithms (e.g., genetic algorithms) over residual‐path interventions to assemble minimal circuits.  \n",
            "- Sparse Linear Probes: Leverage sparsity‐inducing regularizers to find compact subspaces in residual activations that encode high‐level features.  \n",
            "- Unified Attribution Framework: Integrate attention‐based and residual‐stream analyses into a cohesive causal‐feature map.\n",
            "\n",
            "4. Integration & Curriculum Implications (ARENA Module: Mechanistic Interpretability)\n",
            "- Hands‐On Labs: Implement attention‐rollout and causal tracing in PyTorch; visualize head patterns and residual‐component contributions.  \n",
            "- Case Studies: Analyze a toy LLM to identify a subject–verb agreement circuit via combined attention/residual tracing.  \n",
            "- Research Projects: Extend current tools to long‐context transformers; benchmark causal attribution metrics across model sizes.\n",
            "\n",
            "References:\n",
            "- Abnar & Zuidema (2020), “Quantifying Attention Flow.”\n",
            "- Elhage et al. (2021), “An Overview of Causal Mediation in Transformers.”\n",
            "- Meng et al. (2022), “Locating and Editing Factual Associations in GPT.”\n",
            "- Dalvi et al. (2023), “Emergent Circuits in Vision Transformers.”\n",
            "- Olah et al. (2020), “Zoom In: An Intro to Circuits.”\n",
            "- Vig (2019), “A Multiscale Visualization of Attention in the Transformer.”\n",
            "\u001b[39m\n",
            "======\u001b[33mWorker node 47da34df-026a-461e-bbd5-043cb6af5074 (Dr. Azure Practical (Applied Research Specialist) - Focuses on practical applications, training methodologies, and hands-on implementation guidance) get task arena_research_001.2: 3. (Dr. Azure Practical) Develop practical application guidance and hands‐on ARENA exercises for mechanistic interpretability: outline step‐by‐step implementation methodologies, real‐world deployment considerations, and training protocols for students.\u001b[39m\n",
            "======\n",
            "\u001b[32mReply from Worker node 47da34df-026a-461e-bbd5-043cb6af5074 (Dr. Azure Practical (Applied Research Specialist) - Focuses on practical applications, training methodologies, and hands-on implementation guidance):\u001b[39m\n",
            "\n",
            "\u001b[32mPractical Application Guidance and Hands-On ARENA Exercises for Mechanistic Interpretability\n",
            "\n",
            "1. Step-by-Step Implementation Methodologies\n",
            "\n",
            "Module M1.1: Environment & Model Preparation\n",
            "  • Install Python 3.9+, PyTorch ≥1.10, HuggingFace Transformers, Captum, Ecco library (Petroni et al. 2021).  \n",
            "  • Download a small pretrained model (e.g., GPT-2 small) and a synthetic text dataset (wiki-snippets).\n",
            "  • Build a data loader for tokenized inputs (n≤128 tokens) with attention masks.\n",
            "  \n",
            "Module M1.2: Attention Pattern Analysis Lab\n",
            "  • Compute per-head attention weights A^ℓ=softmax(QKᵀ/√d_k) for all layers ℓ. (ARENA Lab: Attention Rollout)\n",
            "  • Implement “attention rollout” (Abnar & Zuidema 2020) in PyTorch; produce effective attention R^ℓ.\n",
            "  • Visualize heatmaps for key heads (e.g., #0 focusing on copy, #3 on coreference).  \n",
            "  • Head attribution: ablate individual heads by zeroing W_Q,W_K,W_V and record Δ downstream token probability (Wiegreffe & Pinter 2019).\n",
            "  • Deliverable: Jupyter notebook with attention maps, head ranking table, interpretive summary.\n",
            "  \n",
            "Module M1.3: Residual Stream & Circuit Tracing Lab\n",
            "  • Instrument residual activations h^(ℓ) at each block using forward hooks.  \n",
            "  • Perform causal tracing (Meng et al. 2022): replace h^(ℓ) for token “Alice” with baseline embedding; measure Δ logit for “went.”\n",
            "  • Identify multi-layer circuit for subject–verb agreement: track PCA components (Dalvi et al. 2023) that correlate with grammatical number.\n",
            "  • Neuron clamping (Olah et al. 2020): clamp top-k neurons in identified circuit; evaluate effect on model output.\n",
            "  • Deliverable: Circuit diagram, code for hooks, quantitative causality table.\n",
            "\n",
            "Module M1.4: Integrative Project\n",
            "  • Students define a novel interpretable phenomenon (e.g., factual recall).  \n",
            "  • Apply combined attention + residual tracing to isolate the responsible circuit.  \n",
            "  • Present findings in group “journal club” style, including visual artifacts and code snippets.\n",
            "\n",
            "2. Real-World Deployment Considerations\n",
            "\n",
            "Compute & Scalability\n",
            "  • GPU memory profiling: batch vs. sequence length trade-offs for O(n²) attention.  \n",
            "  • Use sparse-attention approximations (Linformer/Performer) to scale to n>1K tokens.\n",
            "\n",
            "Monitoring & Maintenance\n",
            "  • Integrate interpretability metrics into CI/CD: per-commit head-importance stability tests.\n",
            "  • Logging sensitive tokens: redact PII before interpretability analysis to comply with privacy standards.\n",
            "\n",
            "Toolchain Integration\n",
            "  • Deploy Captum-based attribution hooks in production inference pipeline; expose dashboards in Weights & Biases.\n",
            "  • Set up nightly retraining jobs to detect drift in interpretability patterns (e.g., head specialization shift).\n",
            "\n",
            "Governance & Security\n",
            "  • Role-based access: restrict interpretability sandbox access to vetted researchers.\n",
            "  • Audit logs: record all circuit-tracing experiments to enable reproducibility and post hoc review.\n",
            "\n",
            "3. Training Protocols & Evaluation\n",
            "\n",
            "Duration & Structure\n",
            "  • 4-week module: Week 1 (Theory + Env Setup), Week 2 (Attention Labs), Week 3 (Residual Labs), Week 4 (Integrative Project).\n",
            "  • Weekly 2-hour lectures + 3-hour hands-on labs; 1-hour office hours per team.\n",
            "\n",
            "Assessment Criteria\n",
            "  • Code correctness & documentation (40%)\n",
            "  • Interpretability insights and clarity of visualizations (30%)\n",
            "  • Final project report and presentation (30%)\n",
            "\n",
            "Collaboration & Mentorship\n",
            "  • Pair programming for labs; rotate pairs weekly.\n",
            "  • Weekly peer code reviews using GitHub PR templates focused on interpretability best practices.\n",
            "\n",
            "Recommended Reading & Resources\n",
            "  • Abnar & Zuidema (2020), Quantifying Attention Flow\n",
            "  • Meng et al. (2022), Locating & Editing Factual Associations\n",
            "  • Olah et al. (2020), Circuits\n",
            "  • Ecco library tutorials (https://github.com/jalammar/ecco)\n",
            "\n",
            "By following this protocol, students gain hands-on mastery of mechanistic interpretability techniques and understand practical considerations for deploying these methods in production AI systems.\u001b[39m\n",
            "======\u001b[33mWorker node 6518b0c5-99af-4edd-902b-5def9c76ee01 (Prof. Claude Ethics (Ethics & Policy Specialist) - Analyzes ethical implications, policy considerations, and societal impact of AI alignment research) get task arena_research_001.3: 4. (Prof. Claude Ethics) Analyze the ethical and policy implications of mechanistic interpretability research within the ARENA framework: assess societal impacts, propose governance and ethical frameworks, and recommend policy measures for responsible adoption.\u001b[39m\n",
            "======\n",
            "\u001b[32mReply from Worker node 6518b0c5-99af-4edd-902b-5def9c76ee01 (Prof. Claude Ethics (Ethics & Policy Specialist) - Analyzes ethical implications, policy considerations, and societal impact of AI alignment research):\u001b[39m\n",
            "\n",
            "\u001b[32mEthical and Policy Analysis of Mechanistic Interpretability Research\n",
            "\n",
            "1. Societal Impact Assessment\n",
            "\n",
            "1.1 Benefits\n",
            "- Enhanced AI Transparency\n",
            "  • Improved public trust through better model understanding\n",
            "  • Ability to verify alignment with human values\n",
            "  • Early detection of potentially harmful behaviors\n",
            "\n",
            "- Safety Advancement\n",
            "  • Better identification of failure modes\n",
            "  • Increased capability to prevent unintended consequences\n",
            "  • More reliable testing and validation methods\n",
            "\n",
            "1.2 Risks\n",
            "- Dual Use Concerns\n",
            "  • Knowledge of model internals could enable malicious exploitation\n",
            "  • Potential misuse for adversarial attacks\n",
            "  • Privacy vulnerabilities through circuit analysis\n",
            "\n",
            "- Power Concentration\n",
            "  • Technical barriers may limit access to interpretability tools\n",
            "  • Knowledge asymmetry between organizations\n",
            "  • Potential monopolization of safety verification capabilities\n",
            "\n",
            "2. Ethical Framework\n",
            "\n",
            "2.1 Core Principles\n",
            "- Transparency\n",
            "  • Open publication of interpretability methods\n",
            "  • Sharing of tools and datasets\n",
            "  • Clear documentation of limitations\n",
            "\n",
            "- Responsibility\n",
            "  • Mandatory safety assessments before deployment\n",
            "  • Continuous monitoring of model behavior\n",
            "  • Ethical review processes for interpretability research\n",
            "\n",
            "- Equity\n",
            "  • Equal access to interpretability tools\n",
            "  • Diverse stakeholder involvement\n",
            "  • Fair distribution of benefits\n",
            "\n",
            "2.2 Implementation Guidelines\n",
            "- Research Ethics\n",
            "  • Institutional review boards for interpretability studies\n",
            "  • Protected sharing of sensitive findings\n",
            "  • Responsible disclosure protocols\n",
            "\n",
            "- Educational Ethics\n",
            "  • Balanced curriculum covering benefits and risks\n",
            "  • Ethics training for practitioners\n",
            "  • Emphasis on responsible use\n",
            "\n",
            "3. Policy Recommendations\n",
            "\n",
            "3.1 Regulatory Framework\n",
            "- Mandatory Requirements\n",
            "  • Interpretability assessments for high-risk AI systems\n",
            "  • Documentation of model analysis results\n",
            "  • Regular safety audits using interpretability tools\n",
            "\n",
            "- Standards Development\n",
            "  • Technical standards for interpretability methods\n",
            "  • Certification processes for practitioners\n",
            "  • Benchmarks for evaluation\n",
            "\n",
            "3.2 Governance Structures\n",
            "- Oversight Bodies\n",
            "  • International coordination committee\n",
            "  • Multi-stakeholder advisory boards\n",
            "  • Technical working groups\n",
            "\n",
            "- Reporting Mechanisms\n",
            "  • Standardized documentation requirements\n",
            "  • Incident reporting protocols\n",
            "  • Public transparency reports\n",
            "\n",
            "4. Implementation Roadmap\n",
            "\n",
            "4.1 Short-term Actions (0-2 years)\n",
            "- Establish ethical guidelines for ARENA curriculum\n",
            "- Develop preliminary standards\n",
            "- Create stakeholder consultation process\n",
            "\n",
            "4.2 Medium-term Goals (2-5 years)\n",
            "- Implement certification programs\n",
            "- Build international coordination mechanisms\n",
            "- Deploy monitoring frameworks\n",
            "\n",
            "4.3 Long-term Objectives (5+ years)\n",
            "- Achieve global standards harmonization\n",
            "- Establish comprehensive governance framework\n",
            "- Create sustainable oversight mechanisms\n",
            "\n",
            "5. Recommendations for ARENA Curriculum\n",
            "\n",
            "5.1 Educational Components\n",
            "- Ethics Module Integration\n",
            "  • Case studies on interpretability ethics\n",
            "  • Hands-on exercises with ethical considerations\n",
            "  • Discussion of dual-use implications\n",
            "\n",
            "5.2 Policy Training\n",
            "- Regulatory Awareness\n",
            "  • Current and emerging regulations\n",
            "  • Compliance requirements\n",
            "  • Risk assessment frameworks\n",
            "\n",
            "5.3 Professional Development\n",
            "- Ethics Certification\n",
            "  • Professional codes of conduct\n",
            "  • Responsibility training\n",
            "  • Impact assessment methods\n",
            "\n",
            "Conclusion:\n",
            "Mechanistic interpretability research requires careful balance between technical advancement and ethical considerations. The ARENA framework must emphasize responsible development while ensuring equitable access and robust safety measures. Success depends on strong governance structures and continuous stakeholder engagement.\u001b[39m\n",
            "======\n",
            "============================================================\n",
            "📊 RESEARCH RESULTS\n",
            "============================================================\n",
            "Mechanistic Interpretability in Large Language Models  \n",
            "Comprehensive Collaborative Research Analysis for the ARENA AI Alignment Curriculum  \n",
            "\n",
            "1. Key Concepts & Relationships  \n",
            "   • Feature Visualization – visualizing individual neurons, attention heads or MLP features to map activations to human‐readable concepts.  \n",
            "   • Circuit Analysis – identifying “circuits” (chains of neurons/heads across layers) that implement specific computations (e.g., subject–verb agreement).  \n",
            "   • Attribution Methods – quantifying which inputs or internal components (tokens, heads, neurons) causally influence particular outputs via interventions, ablations or tracing.  \n",
            "\n",
            "   Alignment Strategy Mapping  \n",
            "   • Transparency – interpretability tools reveal how decisions arise and expose biases or failure modes.  \n",
            "   • Safety Monitoring – derive metrics from internal activations to detect drift or undesirable behavior.  \n",
            "   • Iterative Refinement – use mechanistic insights to adjust architectures, training objectives (including RLHF) and safety constraints.  \n",
            "\n",
            "2. Technical Analysis  \n",
            "   2.1 Attention‐Pattern Analysis  \n",
            "     – Mathematical foundation: Q= XW_Q, K= XW_K, V= XW_V; A = softmax(QKᵀ/√d_k); O = A·V; multi‐head decomposition.  \n",
            "     – Algorithms: attention rollout (effective attention over layers), attention flow (max‐flow/min‐cut), head ablation for causal ranking.  \n",
            "     – Limitations: attention ≠ causation, noisy/sparse heads, O(n²) cost.  \n",
            "     – Future directions: causal mediation analysis, sparse‐attention probes, automated head‐role clustering.  \n",
            "\n",
            "   2.2 Residual‐Stream Analysis  \n",
            "     – Mathematics: residual updates h′ = h + Attention(h); h_next = h′ + MLP(h′); linearized Taylor‐path decompositions.  \n",
            "     – Algorithms: causal tracing (intervene on residuals), PCA‐based residual probing, neuron clamping.  \n",
            "     – Limitations: nonlinearity, combinatorial layer‐interactions, high compute overhead.  \n",
            "     – Future directions: automated circuit search (e.g., genetic algorithms), sparse linear probes, unified causal‐feature mapping.  \n",
            "\n",
            "3. Practical Applications & ARENA Exercises  \n",
            "   3.1 Module M1.1: Environment & Model Prep  \n",
            "     • Install PyTorch, Transformers, Captum, Ecco; load GPT-2 small; tokenization pipeline.  \n",
            "   3.2 Module M1.2: Attention Labs  \n",
            "     • Compute A^ℓ; implement rollout; visualize head heatmaps; ablation attribution.  \n",
            "   3.3 Module M1.3: Residual & Circuit Tracing  \n",
            "     • Hook residual streams; perform causal tracing on tokens; identify PCA components; clamp neurons.  \n",
            "   3.4 Module M1.4: Integrative Project  \n",
            "     • Students select a phenomenon (e.g., factual recall), apply combined methods, present findings.  \n",
            "\n",
            "   Real-World Deployment  \n",
            "     • Scale via sparse attention; CI/CD integration for interpretability metrics; privacy-compliant data handling; dashboarding (Weights & Biases).  \n",
            "     • Governance: role‐based sandbox access, audit logs of experiments.  \n",
            "\n",
            "   Training Protocol  \n",
            "     • 4-week module: theory, attention lab, residual lab, project; lectures + hands-on labs; assessments on code quality, insights, presentations.  \n",
            "\n",
            "4. Ethical & Policy Considerations  \n",
            "   4.1 Societal Impacts  \n",
            "     – Benefits: deeper transparency, early failure detection, safer deployments.  \n",
            "     – Risks: dual‐use (adversarial exploitation), privacy leakage, power concentration.  \n",
            "   4.2 Ethical Framework  \n",
            "     – Principles: Transparency, Responsibility, Equity.  \n",
            "     – Guidelines: IRB review, responsible disclosure, ethics training in curriculum.  \n",
            "   4.3 Policy Recommendations  \n",
            "     – Require interpretability assessments for high-risk systems; develop technical standards and certification for practitioners; establish multi-stakeholder oversight bodies; standardized incident reporting.  \n",
            "   4.4 Implementation Roadmap  \n",
            "     – Short-term (0–2 yrs): ethics guidelines, preliminary standards, stakeholder consultations.  \n",
            "     – Medium (2–5 yrs): certification programs, international coordination, monitoring frameworks.  \n",
            "     – Long-term (>5 yrs): harmonized global standards, sustainable oversight.  \n",
            "\n",
            "5. Synthesis & Recommendations  \n",
            "   5.1 How these techniques illuminate LLM behavior  \n",
            "     – Attention and residual analyses expose which tokens, heads and neurons implement linguistic features, factual recall, reasoning steps.  \n",
            "   5.2 Informing AI Alignment  \n",
            "     – By mapping internal circuits to semantic functions, we can detect misaligned “subroutines” and intervene (e.g., fine-tune or edit circuits).  \n",
            "   5.3 Limitations & Future Directions  \n",
            "     – Scaling to ultra-long contexts; bridging correlation→causation; automating circuit discovery; integrating across attribution modalities.  \n",
            "   5.4 Curriculum & Research Priorities  \n",
            "     – Embed hands-on labs on causal mediation and circuit‐editing; benchmark interpretability across model sizes; explore interpretability’s role in governance and adversarial robustness; develop standardized evaluation metrics.  \n",
            "\n",
            "Conclusion  \n",
            "Mechanistic interpretability provides a concrete bridge between black-box LLMs and human‐readable computational structure, enabling transparency, safety monitoring and alignment interventions. The ARENA curriculum should leverage these methods in practical labs, embed ethical governance training, and drive research into scalable, causal, and standardized interpretability techniques for robust, aligned future AI systems.\n"
          ]
        }
      ],
      "source": [
        "\"\"\"Demonstrating the ARENA Research Society.\"\"\"\n",
        "society = ARENAResearchSociety()\n",
        "\n",
        "# Example research topics related to the ARENA curriculum\n",
        "sample_topics = {\n",
        "    1: {\n",
        "        \"topic\": \"Mechanistic Interpretability in Large Language Models\",\n",
        "        \"questions\": \"\"\"\n",
        "        - How do the latest mechanistic interpretability techniques apply to understanding LLM behavior?\n",
        "        - What are the most effective methods for interpreting attention patterns and residual streams?\n",
        "        - How can mechanistic interpretability inform AI alignment strategies?\n",
        "        - What are the current limitations and future directions in this field?\n",
        "        \"\"\",\n",
        "    },\n",
        "    2: {\n",
        "        \"topic\": \"RLHF Implementation Challenges and Best Practices\",\n",
        "        \"questions\": \"\"\"\n",
        "        - What are the main technical challenges in implementing RLHF at scale?\n",
        "        - How do different reward modeling approaches compare in effectiveness?\n",
        "        - What are the alignment implications of various RLHF techniques?\n",
        "        - How can we address issues like reward hacking and distributional shift?\n",
        "        \"\"\",\n",
        "    },\n",
        "    3: {\n",
        "        \"topic\": \"AI Governance Frameworks for Emerging Technologies\",\n",
        "        \"questions\": \"\"\"\n",
        "        - What governance frameworks are most suitable for rapidly advancing AI capabilities?\n",
        "        - How can policy makers balance innovation with safety considerations?\n",
        "        - What role should technical AI safety research play in policy development?\n",
        "        - How can international coordination on AI governance be improved?\n",
        "        \"\"\",\n",
        "    },\n",
        "}\n",
        "\n",
        "print(\"🎯 ARENA AI Alignment Research Society\")\n",
        "print(\"Choose a research topic or provide your own:\")\n",
        "print()\n",
        "\n",
        "for num, info in sample_topics.items():\n",
        "    print(f\"{num}. {info['topic']}\")\n",
        "print(\"4. Custom research topic\")\n",
        "print()\n",
        "\n",
        "try:\n",
        "    choice = input(\"Enter your choice (1-4): \").strip()\n",
        "\n",
        "    if choice in (\"1\", \"2\", \"3\"):\n",
        "        topic_info = sample_topics[int(choice)]\n",
        "        result = society.run_research(topic_info[\"topic\"], topic_info[\"questions\"])\n",
        "    elif choice == \"4\":\n",
        "        custom_topic = input(\"Enter your research topic: \").strip()\n",
        "        custom_questions = input(\"Enter specific questions (optional): \").strip()\n",
        "        result = society.run_research(custom_topic, custom_questions or None)\n",
        "    else:\n",
        "        print(\"Invalid choice. Running default research...\")\n",
        "        result = society.run_research(sample_topics[1][\"topic\"], sample_topics[1][\"questions\"])\n",
        "\n",
        "except KeyboardInterrupt:\n",
        "    print(\"\\n👋 Research session interrupted.\")\n",
        "except Exception as e:\n",
        "    print(f\"❌ Error during research: {e}\")\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "tIjwx8qaA16N"
      },
      "source": [
        "## What this does:\n",
        "\n",
        "- Provides predefined research topics drawn from the ARENA curriculum\n",
        "- Accepts a custom topic and optional questions for flexible research\n",
        "- Handles user interaction gracefully, catching interrupts and errors\n",
        "- Exercises the full collaborative AI society end to end"
      ]
    },
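    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "For scripted runs (batch experiments, CI) where `input()` prompts are impractical, the menu logic above can be factored into a small helper. This is a sketch: `select_topic` is a hypothetical name, and `society.run_research` is assumed from the earlier cells.\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "def select_topic(choice, topics, default=1):\n",
        "    \"\"\"Map a raw menu choice to (topic, questions), falling back to a default.\n",
        "\n",
        "    `topics` is a dict shaped like `sample_topics` above; `choice` is the\n",
        "    string a user or script supplies.\n",
        "    \"\"\"\n",
        "    try:\n",
        "        info = topics[int(choice)]\n",
        "    except (KeyError, ValueError):\n",
        "        info = topics[default]\n",
        "    return info[\"topic\"], info[\"questions\"]\n",
        "\n",
        "# Non-interactive usage (assumes `society` and `sample_topics` from earlier cells):\n",
        "# topic, questions = select_topic(\"2\", sample_topics)\n",
        "# result = society.run_research(topic, questions)"
      ]
    },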
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "pJ101gxnA4fY"
      },
      "source": [
        "## 🚀 Running the Cookbook\n",
        "To run this collaborative AI research society:\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "6cO2bYKQBoHS"
      },
      "source": [
        "Execute the cells in order, from top to bottom."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "YyCoe0zXBvVY"
      },
      "source": [
        "Follow the prompts: enter your API credentials and select a research topic."
      ]
    },
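    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "If you prefer environment variables over interactive prompts, you can export your credentials before launching the notebook. The variable names below are assumptions based on common Anthropic and Azure OpenAI SDK conventions; match them to whatever the setup cell in this notebook actually reads:\n",
        "\n",
        "```bash\n",
        "# Assumed variable names - adjust to match the setup cell above.\n",
        "export ANTHROPIC_API_KEY=\"sk-ant-...\"\n",
        "export AZURE_OPENAI_API_KEY=\"...\"\n",
        "export AZURE_OPENAI_ENDPOINT=\"https://<your-resource>.openai.azure.com/\"\n",
        "```"
      ]
    },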
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "3iQSGaVhBxLR"
      },
      "source": [
        "The system will create a collaborative research environment where Claude and Azure OpenAI agents work together to produce comprehensive analysis on AI alignment topics!"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "wMYMfaZ9Bzdl"
      },
      "source": [
        "## 🎯 Conclusion\n",
        "This CAMEL-powered society demonstrates the potential of multi-agent systems that span different AI platforms.\n",
        "\n",
        "In this cookbook, you've learned how to:\n",
        "\n",
        "- Build cross-platform AI collaboration between Claude 4 and Azure OpenAI models\n",
        "- Create specialized AI researchers with distinct personas and expertise areas\n",
        "- Implement robust workforce management using CAMEL's advanced orchestration\n",
        "- Handle complex API configurations for multiple AI providers seamlessly\n",
        "- Design structured research workflows for AI alignment and safety topics\n",
        "- Create scalable agent societies that can tackle complex, multi-faceted problems\n",
        "\n",
        "This collaborative approach showcases how different AI models can complement each other - Claude's nuanced reasoning and ethical considerations paired with Azure OpenAI's technical precision creates a powerful research dynamic. The ARENA AI alignment focus demonstrates how these societies can be specialized for cutting-edge domains like mechanistic interpretability, RLHF, and AI governance.\n",
        "\n",
        "As the field of multi-agent AI systems continues to evolve, frameworks like CAMEL are paving the way for increasingly sophisticated collaborations. Whether you're researching AI safety, exploring complex technical topics, or building specialized knowledge teams, the patterns and techniques in this cookbook provide a solid foundation for the next generation of AI-powered research.\n",
        "\n",
        "The possibilities are endless when AI agents work together. Keep experimenting, keep collaborating, and keep pushing the boundaries of what's possible.\n",
        "\n",
        "Happy researching! 🔬✨"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "PxVgp-zXB9Hj"
      },
      "source": [
        "That's everything: Got questions about 🐫 CAMEL-AI? Join us on [Discord](https://discord.camel-ai.org)! Whether you want to share feedback, explore the latest in multi-agent systems, get support, or connect with others on exciting projects, we’d love to have you in the community! 🤝\n",
        "\n",
        "Check out some of our other work:\n",
        "\n",
        "1. 🐫 Creating Your First CAMEL Agent [free Colab](https://docs.camel-ai.org/cookbooks/create_your_first_agent.html)\n",
        "\n",
        "2. Graph RAG Cookbook [free Colab](https://colab.research.google.com/drive/1uZKQSuu0qW6ukkuSv9TukLB9bVaS1H0U?usp=sharing)\n",
        "\n",
        "3. 🧑‍⚖️ Create A Hackathon Judge Committee with Workforce [free Colab](https://colab.research.google.com/drive/18ajYUMfwDx3WyrjHow3EvUMpKQDcrLtr?usp=sharing)\n",
        "\n",
        "4. 🔥 3 ways to ingest data from websites with Firecrawl & CAMEL [free Colab](https://colab.research.google.com/drive/1lOmM3VmgR1hLwDKdeLGFve_75RFW0R9I?usp=sharing)\n",
        "\n",
        "5. 🦥 Agentic SFT Data Generation with CAMEL and Mistral Models, Fine-Tuned with Unsloth [free Colab](https://colab.research.google.com/drive/1lYgArBw7ARVPSpdwgKLYnp_NEXiNDOd-?usp=sharing)\n",
        "\n",
        "Thanks from everyone at 🐫 CAMEL-AI\n",
        "\n",
        "\n",
        "<div class=\"align-center\">\n",
        "  <a href=\"https://www.camel-ai.org/\"><img src=\"https://i.postimg.cc/KzQ5rfBC/button.png\" width=\"150\"></a>\n",
        "  <a href=\"https://discord.camel-ai.org\"><img src=\"https://i.postimg.cc/L4wPdG9N/join-2.png\" width=\"150\"></a>\n",
        "\n",
        "⭐ <i>Star us on [*GitHub*](https://github.com/camel-ai/camel), join our [*Discord*](https://discord.camel-ai.org) or follow our [*X*](https://x.com/camelaiorg)</i>\n",
        "</div>\n"
      ]
    }
  ],
  "metadata": {
    "colab": {
      "provenance": []
    },
    "kernelspec": {
      "display_name": "Python 3",
      "name": "python3"
    },
    "language_info": {
      "name": "python"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 0
}
